Extended abstract: Learning search strategies
Abstract
The underlying motivation for the work presented in this paper is to gain a useful understanding of what is called the value of computation (Russell & Wefald 1991). By this we intuitively mean the following. Suppose we have some computational process C which, at some point, has a choice between the computations c₀ and c₁. We would like to be able to make claims of the form: c₀ is a better choice for C than c₁ because the value of c₀ is greater than that of c₁. Extending this idea by calling the set of choices made by C a program, we would similarly like to be able to say that the value of the program π = {cᵢ} is greater than that of π′ = {c′ᵢ}. This, in turn, would allow us to define the best, or bounded-optimal (Russell & Subramanian 1995), program for C.

Let us consider what would be required of a formalism to make these ideas meaningful. It is certainly necessary to first provide a context for the computation; i.e., we need to define what we want the computation to do. For example, the classical framework of complexity theory was developed by considering the context of decision problems. Unfortunately, it is difficult to construct a definition of the value of computation within this framework that is ultimately non-trivial. The underlying cause of this difficulty is that a decision problem has a "correct answer". This answer is independent of time, and hence also independent of the computational process. At best, we can only discuss the amount of time (or of some other resource) taken by the entire computational process that computes and outputs this answer. What we would like, instead, is to be able to consider the incremental process of computation and the tradeoffs between continuing and stopping the computation at any point in time. This idea is explored in the field of anytime algorithms (Dean & Boddy 1988; Zilberstein & Russell 1995).
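As a deliberately simple illustration of these two ideas (not a formalism proposed in this paper), the sketch below treats the value of a computation as its expected gain in solution quality minus the cost of the time it consumes, a myopic reading in the spirit of Russell & Wefald. It assumes a hypothetical anytime performance profile q(t) = 1 − e^(−rate·t) and a fixed cost per step; `quality`, `net_value_of_step`, `stopping_time`, and `best_computation` are illustrative names only, and all numbers are made up.

```python
import math
from typing import Dict


def quality(t: int, rate: float = 0.5) -> float:
    """Hypothetical anytime performance profile: solution quality after t steps."""
    return 1.0 - math.exp(-rate * t)


def net_value_of_step(t: int, cost_per_step: float, rate: float = 0.5) -> float:
    """Myopic value of one more step: quality gained minus the cost of the time it takes."""
    return quality(t + 1, rate) - quality(t, rate) - cost_per_step


def stopping_time(cost_per_step: float, rate: float = 0.5, max_steps: int = 1000) -> int:
    """Keep computing while the next step has positive net value; return the steps taken."""
    t = 0
    while t < max_steps and net_value_of_step(t, cost_per_step, rate) > 0:
        t += 1
    return t


def best_computation(expected_gain: Dict[str, float], cost: Dict[str, float]) -> str:
    """Choose among candidate computations (e.g. c0 vs c1) by net value: gain minus cost."""
    return max(expected_gain, key=lambda c: expected_gain[c] - cost[c])


if __name__ == "__main__":
    # Hypothetical numbers: c0 nets 0.2, c1 nets 0.1, so c0 is the better choice.
    print(stopping_time(0.05), best_computation({"c0": 0.3, "c1": 0.5}, {"c0": 0.1, "c1": 0.4}))
    # expected: 5 c0
```

Because this hypothetical profile has diminishing returns, stopping at the first step whose gain falls below its cost also gives the best fixed stopping time; for profiles without diminishing returns, such a myopic rule can stop too early.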
Unfortunately, a limitation of the framework of anytime algorithms is that the focus is still typically on a single, independent problem. It is assumed that this problem is given to the computational process when it is initialized, and the goal becomes that of characterizing the performance of the computation with respect to the time at which it is stopped or interrupted. What we would like, however, is to be able to express the problem itself as a process with which the computational process must interact; for example, we would like to be able to address the question of what to do when decision problems occur sequentially. Hence we find that the context of control problems is best suited to our discussion. The particular model of control problems that we consider is based on¹ that of semi-Markov decision processes (SMDPs) (Parr 1998). It is well known that SMDPs have optimal policies that are stationary².
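For readers less familiar with SMDPs, here is a minimal sketch (not the model used in this paper) of discounted value iteration on a toy SMDP. It assumes the common convention that each transition takes an integer sojourn time τ ≥ 1, pays a lump-sum reward when taken, and discounts the next state's value by γ^τ; the states, actions, rewards, and γ = 0.9 are hypothetical. The greedy policy it returns maps each state to one action regardless of time or history, so it is a stationary policy, which illustrates the property cited above under standard finite, discounted assumptions.

```python
from typing import Dict, List, Tuple

# One possible outcome of an action: (probability, next_state, sojourn_time, reward).
Outcome = Tuple[float, str, int, float]
# Hypothetical SMDP encoding: (state, action) -> list of outcomes.
# Assumes every reachable state has at least one action and every sojourn time is >= 1.
SMDP = Dict[Tuple[str, str], List[Outcome]]


def value_iteration(
    smdp: SMDP, gamma: float = 0.9, tol: float = 1e-10
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Discounted SMDP value iteration.

    Q(s, a) = sum over outcomes of p * (r + gamma**tau * V(s')),
    so transitions with longer sojourn times are discounted more heavily.
    """
    states = {s for (s, _) in smdp}
    actions = {s: [a for (s2, a) in smdp if s2 == s] for s in states}
    values = {s: 0.0 for s in states}

    def q(s: str, a: str, v: Dict[str, float]) -> float:
        return sum(p * (r + gamma ** tau * v[nxt]) for p, nxt, tau, r in smdp[(s, a)])

    while True:
        # Synchronous Bellman backup using the previous sweep's values.
        new_values = {s: max(q(s, a, values) for a in actions[s]) for s in states}
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break

    # Greedy policy: a fixed state -> action map, i.e. a stationary policy.
    policy = {s: max(actions[s], key=lambda a: q(s, a, values)) for s in states}
    return values, policy


def example_smdp(slow_tau: int = 3) -> SMDP:
    """Toy two-state SMDP with hypothetical numbers."""
    return {
        ("s0", "fast"): [(1.0, "s0", 1, 1.0)],
        ("s0", "slow"): [(1.0, "s1", slow_tau, 3.0)],
        ("s1", "back"): [(1.0, "s0", 1, 0.0)],
    }


if __name__ == "__main__":
    values, policy = value_iteration(example_smdp())
    print(policy["s0"], round(values["s0"], 3))  # expected: fast 10.0
```

With the default sojourn time of 3 for "slow", the sketch picks "fast" in s0, since V(s0) = 1/(1 − 0.9) = 10 beats the slow loop's 3/(1 − 0.9⁴) ≈ 8.72. Setting `slow_tau=1`, which turns the toy into an ordinary MDP, makes "slow" preferable (V(s0) = 3/0.19 ≈ 15.79), showing why the sojourn time matters.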
Similar resources
The Integrated Supply Chain of After-sales Services Model: A Multi-objective Scatter Search Optimization Approach
Abstract: In recent decades, the high profits of extended warranties have led third-party firms to regard them as a lucrative after-sales service. However, the division of customers by risk aversion and the effect of offering extended warranties on manufacturers' basic warranties should be investigated when adjusting such services. Since risk-averse customers welcome extended warranties, while the cu...
Learning Adaptation Strategies by Introspective Reasoning about Memory Search
In case-based reasoning systems, the case adaptation process is traditionally controlled by static libraries of hand-coded adaptation rules. This paper proposes a method for learning adaptation knowledge in the form of adaptation strategies of the type developed and hand-coded by Kass [90]. Adaptation strategies differ from standard adaptation rules in that they encode general memory search pr...
Search Strategies as Synchronous Processes (Extended Abstract)
Solving constraint satisfaction problems (CSP) efficiently depends on the solver configuration and the search strategy. However, it is difficult to customize the constraint solvers because they are not modular enough, and it is hard to create new search strategies by composition. To solve these problems, we propose spacetime programming, a paradigm based on lattices and synchronous process calc...
Model-based Direct Policy Search (Extended Abstract)
Scaling Reinforcement Learning (RL) to real-world problems with continuous state and action spaces remains a challenge. This is partly because the optimal value function can become quite complex in continuous domains. In this paper, we propose to avoid learning the optimal value function altogether and instead to use direct policy search methods in combination with model-based RL.
Designing and Evaluating an Affective Information Literacy Game
The objective of this PhD research project is to examine the influence of the embodied agent's (EA's) affective expressions on students' learning motivation, enjoyment, and learning efficacy in an information literacy (IL) game. This project applies the concepts of digital game-based learning and affective embodied agents to information literacy education, using Kuhlthau's Information Search Process Model as a theoretical framework. In ...
Factors related to academic failure in preclinical medical education: A systematic review
Introduction: Identifying the learners' problems early enough and providing advice from the beginning is definitely an important investment in the training and progress of future practitioners. The current review aimed at examining factors related to academic failure of the preclinical medical students. Methods: The study was carried out as a systematic search of publications in the following databas...
Publication date: 1999